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£1^ Abstract. We state and analyze a generalization of the "truncation trick" 

O , suggested by Gourdon and Sebah to improve the performance of power series 

^0 ■ evaluation by binary splitting. It follows from our analysis that the values of 

D-finite functions (i.e., functions described as solutions of linear differential 
equations with polynomial coefficients) may be computed with error bounded 
by 2~P in time O(p(lgp) 3+O ( 1 >) and space 0(p). The standard fast algorithm 
for this task, due to Chudnovsky and Chudnovsky, achieves the same time 
complexity bound but requires ©(plgp) bits of memory. 

1. Introduction 

Binary splitting is a well-known and widely applicable technique for the fast 
multiple precision numerical evaluation of rational series. For any series ^ n s n 
with limsup n Isnl 1 ^" < 1 whose terms s n obey a linear recurrence relation with 
polynomial coefficients, e.g., 

ln2 = ^s„, s n = i)on+i ' 2(n + 2)s n+ i- (n + l)s n =0, 
n=o ^ n ' 

the binary splitting algorithm allows one to compute the partial sum Y^n=o s ™ 
in 0(M(7V(lgjV) 2 )) bit p era tions [3]. Here M(n) stands for the complexity 
of multiple precision integer multiplication, and lg denotes the binary logarithm. 
As N — 0(p) terms of the series are enough to make the approximation error 
less than 2~ p , the complexity of the algorithm is softly linear in the precision p, 
assuming M(n) = 0(n(lgn)°^). 

Methods based on binary splitting tend to be favored in practice even in cases 
when asymptotically faster algorithms (typically AGM iterations [5]) would apply. 
One high-profile example is the computation of billions of digits of classical con- 
stants such as 7r, C(3) or 7. Basically all record computation in recent years were 
achieved by evaluating suitable series using variants of binary splitting [51 HE] • 

A drawback of the classical binary splitting algorithm, both from the complexity 
point of view and in practice, is its comparatively large memory usage. Indeed, the 
algorithm amounts to the computation of a product tree of matrices derived from 
the recurrence — see Sect. |3] below for details. The intermediate results are matrices 
of rational numbers whose bit sizes roughly double from one level to the next. Near 
the root, their sizes can (and in general do) reach O(plgp), even though the output 
has size @{p). 

However, the space complexity can be lowered to O(p) using a slight variation 
of the classical algorithm. The basic idea is to truncate the intermediate results 
to a precision 0(p) when they start taking up more space than the final result. 
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Of course, these truncations introduce errors. To make the trick into a genuine 
algorithm, we need to analyze the errors, add a suitable number of "guard digits" 
at each step and check that the space and time complexities of the resulting process 
stay within the expected bounds. 

The opportunity to improve the practical behavior of binary splitting using trun- 
cations has been noticed by authors of implementations on several occasions over 
the last decade or so. Gourdon and Sebah [TO] describe truncation as a "crucial" 
optimization. Besides the expected drop of memory usage, they report running 
time improvements by an "appreciable" constant factor. Cheng et al. [3] compare 
truncation with alternative (less widely applicable but sometimes more efficient) 
approaches. Most recently, Kreckel [2] explicitly asks how to make sure that the 
new roundoff errors do not affect the correctness of the result. 

Indeed, the above-mentioned error analysis did not appear in the literature until 
very recently. An article by Yakhontov [26, 27] now provides the required bounds 
in the case of the generalized hypergeometric series p F q , which covers all examples 
where the truncation trick had been used before. But the applicability of the 
method is actually much wider. 

The purpose of this note is to present a more general and arguably simpler analy- 
sis. Our version is more general in two main respects. First, besides hypergeometric 
series, it applies to the solutions of linear ordinary differential equations with ra- 
tional coefficients, also known as D-finite (or holonomic) series |2T]. D-finite series 
are exactly those whose coefficients obey a linear recurrence relation with rational 
coefficients, while hypergeometric series correspond to recurrences of the first order. 
Second, we take into account the coefficient size of the recurrence that generates 
the series to be computed. Allowing the size of the coefficients to vary with the 
target precision p makes it possible to use the modified binary splitting procedure 
as part of the "bit burst" algorithm [5 to handle evaluations at general real or 
complex points approximated by rationals of size O(p). 

Additionally, our analysis readily adapts to other applications of binary splitting. 
The simplicity and generality of the proof are direct consequences of viewing the 
algorithm primarily as the computation of a product tree. See Gosper [5] and 
Bernstein [TJ §12-16] for further comments on this point of view. 

The remainder of this note is organized as follows. Section [2 contains some 
notations and assumptions. In Sect. [3] we recall the standard binary splitting 
algorithm, which will serve as a subroutine in the linear-space version. Then, in 
Sect. 21 we state and analyze the "truncated" variant that achieves the linear space 
complexity for general D-finite functions. Finally, Sect. [5] offers a few comments 
on other variants of the binary splitting method and possible extensions of the 
analysis. 

2. Setting 

The performance of the binary splitting algorithm crucially depends on that of 
integer multiplication. Following common usage, we denote by M(n) a bound on the 
time needed to multiply two integers of at most n bits. Currently the best theoret- 
ical bound [7J is M(n) = 0(n(lgn) exp 0(lg* n)), where lg* n = min{fclg ofc n < 1}. 
In practice, implementations such as GMP [TT] use variants of the Schonhage- 
Strassen algorithm of complexity 0(n(lgn)(lglgn)). We make the usual assump- 
tion [25] that the function n H> M(n)/n is nondecreasing. It follows that M(n) + 
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M(to) ^ M(n + to). We also assume that the space complexity of integer multipli- 
cation is linear, which is true for the standard algorithms. 

Write K = Q(i), and define the bit size of a number [x + iy)/w G K (where 
w,x,y € Z) as |~lgw] + \lgx] + fig j/] + 1. Consider a linear differential equation 
with coefficients in K(z). ft will prove convenient to clear all denominators (both 
polynomial and integer) and multiply the equation by a power of z to write it as 



Let s = maxfc dega^, and let hi denote the maximum bit size of the coefficients of 
the dfc. Although our complexity estimates depend on r and hi, we do not consider 
more general dependencies on the equation. Thus, the afc are assumed to vary only 
in ways that can be described in terms of these two parameters. Specifically, we 
assume that s — 0(1) and that the coefficients of ak(z)/a r (0) are all restricted to 
some bounded domain. 

We also assume that is an ordinary (i.e. nonsingular) point of ([T]). This implies 
that a r (0) 7^ and s r. The case of regular singular points (those for which we 
still have a r (Q) ^ but possibly s < r [I3J Chap. 9]) is actually similar [531 [T7]; we 
focus on ordinary points to avoid cumbersome notations. 

Let p = min{|z| : a r (z) — 0} € (0,oo]. Then any formal series solution y(z) = 
Y^n>oVn zn 01 © converges on the disk \z\ < p. We select a particular solution 
(say, by specifying initial values y(0), . . . , y' r_1 '(0) in some fixed, bounded domain), 
and an evaluation point £ S K with |£| < p. Let denote the bit size of and 
let h = hi + fi2. Again, /12 is allowed to grow to infinity, but we assume that |£| is 
bounded away from p. 

Given p ^ 0, our goal is to compute a complex number u> £ K such that 
\co — y(Q\ ^ 2~ p . By a classical argument, which can be reconstructed by sub- 
stituting a series with indeterminate coefficients into (fT]), the sequence (y n ) obeys 
a recurrence relation of the form 

(2) b (ri)y n+r + bi(n)y n+r -i -\ h b s (n)y n+r - s = Q, bj G K [n] . 

Writing ak{z) = ak^o + afc,i z + ' ' ' + a k,sZ s , the bj are given explicitly by 



(1) («,(,)(,-*) 



H h a 1 (z)z— + a {z) ■ y(z) = 0, 



r 



(3) 




k=0 



Based on the matrix form of the recurrence ([2]), set 




where 



1 



C(n) 



1 

b i(") 1 

b (n)/ 




1 




bsM 

b (n) 



s — 



r zeroes 



r — 1 zeroes 



Let P(a, b) = B(b - 1) • ■ • B(a + l)B(a) for all a ^ b. (In particular, P(a, a) is the 
identity matrix.) 
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Algorithm 1. BinSplit(a, b) 
i If b — a ^ (some threshold) 

2 Return B(b — 1) • • • B(a) where B is defined by ([5]) 
3 else 

4 Return BinSplit(L^J , b) ■ BinSplit(a, [^J) 



One may check that bo(n) ^ for n ^ 0, due to the fact that is an ordinary 
point of (fj]). Thus the computation of a partial sum Sjy = J2n=o VnC n reduces to 
that of the matrix product P(0, N). Indeed, we have 

(Vn+r-sC ■ ■ ■ ,yn+r-iC\S„) T = P(0,n) {y r - s , ■ ■ .,y r -i,0) T 

where y r - s = 0, . . . ,y~i = 0, yo, . . . , y r —\ are easily determined from the initial 
values of the differential equation. 

3. Review of the Classical Binary Splitting Algorithm 

Since the entries of the matrix B(n) are rational functions of n, the bit size of 
P(a, b) grows as 0((6 — a) Igb) when 6, (b — a) — > oo. This bound is sharp in the 
sense that it is reached for some (in fact, most) differential equations. Computing 
P(a, b) as B(b— 1) ■ [B(b — 2) ■ [• ■ ■ -B(a)]] then takes time at least quadratic in b— a, 
as can be seen from the combined size of the intermediate results. The term "binary 
splitting" refers to the technique of reorganizing the product into a balanced tree of 
subproducts, using the relation P(a,b) = P(m,b) ■ P(a,m) with m = \_^(a + b)\ , 
and so on recursively. 

A slight complication stems from the fact that removing common divisors be- 
tween the numerators and denominators of the fractions appearing in the interme- 
diate P(a,b) € K rxr would in general be too expensive. Multiplying the numer- 
ators and denominators separately and doing a single final division yields better 
complexity bounds. Let 

(5) B(n) = b (nKB(n) e min] {s+1)x(s+1 \ C = C'/C (C € % [i] , C € Z). 

The entries of B(n) are polynomials of degree at most r and bit size 0(h). To 
compute P(a,b) by binary splitting, we multiply the B(n) for a ^ n < b using 
Algorithm [T] and then divide the resulting matrix by its bottom right entry. The 
general algorithm considered here was first published by Chudnovsky and Chud- 
novsky [5], with (up to minor details) the analysis summarized in Prop. [TJ The 
idea of binary splitting was known long before [51 Q] . 

Proposition 1. [S] As &, N = b — a, h, r — > oo with r = O(N), Algorithm [1] com- 
putes an unreduced fraction equal to P(a, b) in 0(M.(N(h+rlgb)) IgiV) operations, 
using 0(N(h + r\gb)) bits of memory. Assuming M(n) = n(lgn)(lglgn)°( 1 \ both 
bounds are sharp. 

sketch. The bit sizes of the matrices that get multiplied together at any given 
depth ^ 5 < [IgiV] in the recursive calls are at most C2~ s N(h+d\g b) for some C. 
Since there are at most 2 s such products and the multiplication function M(-) was 
assumed to be subadditive, the contribution of each level is bounded by M(C(b — 
a)(h + d\gb)), whence the total time complexity. See [SJ H7j for details. The 
intermediate results stored or multiplied together at any stage of the computation 
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are disjoint subproducts of B(b — 1) • • • B(a), and we assumed the space complexity 
of n-bit integer multiplication to be 0(n), so the space required by the algorithm is 
linear in the combined size of the B{n). Finally, it is not hard to construct examples 
of differential equations that reach these bounds. □ 

Remark 1. The link between our setting and the more common description of 
the algorithm for hypergeometric series is as follows. In the notation of Haible 
and Pananikolaou [T^] also used in Yakhontov's article, the partial sums of the 
hypergeometric series are related to its defining parameters a, 6, p, q by 




' m. 

8(0 

"(0 gw 

v b(i) q(i) 




g(i) = ^-s(i). 
a(i) 



(G) 



This equation becomes ( *« } ) = 6(i ° (i) ) ( B( * r( ^) l) ) upon clearing de- 

nominators. The standard recursive algorithm for hypergeometric series may be 
seen an "inlined" computation of the associated product tree. Each recursive step 
is equivalent to the computation of the matrix product ( B ^ Pr b °q ) ( B )p ' b ®q i ) . 

We return to the evaluation of a D-finite power series within its disk of conver- 
gence. From the differential equation Jl}, suitable initial conditions, the evaluation 
point £ and a target precision p, one can compute |18j a truncation order N such 
that \S N -y(Q\ «S 2-" and 

N~Kp=(]g(\C\/p))- 1 p, ifp<oc 
N = Q(p/lgp), if p = oo. 

Combined with these estimates, Proposition [1] implies the following. 

Corollary 1. Write £ = h + rlgp. Under the assumptions of Proposition [TJ one 
can compute y(C) m 0(M(£p\gp)) bit operations, using 0(£p) bits of memory. The 
complexity goes down to 0(M(£p)) operations and 0(£p/ \gp) bits of memory when 
a r (z) is a constant. 

This result is the basis of more general evaluation algorithms for D-finite func- 
tions [5] . Indeed, binary splitting can be used to compute the required series sums 
at each step when solving a differential equation of the form (|TJ) by the so-called 
method of Taylor series [IS]. Corollary Q] thus extends to the evaluation of y outside 
the disk \z\ < p. Chudnovsky and Chudnovsky further showed how to reduce the 
cost of evaluation from ft (hp) — fl(p 2 ) to softly linear in p when h = 0(p). This 
last situation is very natural since it covers the case where the point £ is itself 
a 0(p)-digits approximation resulting from a previous computation. The method, 
known as the bit burst algorithm, consists in solving the differential equation along 
a path made of approximations of £ of exponentially increasing precision. Its time 
complexity is 0(M(p(lgp) 2 )) [Hj. The improvements from the next section apply to 
all these settings. See also [23] for an overview of more sophisticated applications. 

4. "Truncated" Binary Splitting 

The superiority of binary splitting over alternatives like summing the series in 
floating-point arithmetic results from the controlled growth of intermediate results. 
Indeed, in the product tree computed by Algorithm [U the exact representations of 
most subproducts P(a, b) are much more compact than 0(p)-digits approximations 
would be. However, as already mentioned, the bit sizes of the P(a,b) also grow 
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larger than p near the root of the tree. The size of a subproduct appearing at 
depth S is roughly 2~ s N(h + rig TV). Assuming N — Q(p), this means that the 
intermediate results get significantly larger than the output in the top 0(lglgp) 
levels of the tree. 

A natural remedy is to use a hybrid of binary splitting and naive summation. 
More precisely, we split the full product P(0,N) into A = 0(lniV) subproducts of 
0(p) bits each, which are computed by binary splitting. The results are accumu- 
lated by successive multiplications at precision 0(p). 

We make use of the following notations to state and analyze the algorithm. In 
Equations ([7]) to (fTI} below, the coefficients of a general matrix A £ (£ fexfc are 
denoted (1 ^ p, q ^ k) with x Pt q,y p .q £ K. Let ||-|| be a 

submultiplicative norm on (D fexfe , and let j3 k > be such that 

(7) \\A\\ ^ /3 k JV(A), AT (A) = max{\x id \, IikjIWj**. 

For defmiteness, assume for now that ||-j| = l)-^ is the matrix norm induced by the 
vector 1-norm. (We will discuss this choice later.) Then it holds that 

k 

(8) AT (A) ^ \\A\l = maxV|a;,-| ^V2kAf(A) 

3 = 1 f-f 



b k (n) 



b Q (n) 



and 

6-1 6-1 , 

(9) ||P( 0| 6)|| < J] ||B(n)|| < [] (l + KI + ICImax 

n—a n—a ^ 

Observe that, since 1 is an eigenvalue of B(n) and the norm ||-|| is assumed to be 
submultiplicative, we have ||P(n)| 1 for all n. Besides, it is clear from §3§ that 
||P(n)|| is bounded. 

Given a £ (Q and e < 1, let 

(10) Tranche) = sgn(a) \¥ \a\\ 2" e , e = [lge -1 ] . 

We have |Trunc(a, e) — a\ ^ e; the size of Trunc(a,e) is 0(lge _1 ) for bounded a; 
and Trunc(a, e) may be computed in 0(M(/i + e)) bit operations where h is the bit 
size of a. We extend the definition to matrices A £ K fexfc by 

(11) Tranche) = (Trunc(a: p ,„ /3^e) + i Trunc(y Pi? , ^ 1 e)) Up 

so that again ||Trunc(^4, e) — A\\ ^ e. Note that we often write expressions of the 
form Trunc(a*&, e) for some operator *. Though this does not affect our complexity 
bounds, it is usually better to compute the approximate value of a*b directly instead 
of starting with an exact computation and truncating the result. See Brent and 
Zimmermann [5] for some relevant algorithms. 

The complete binary splitting algorithm with truncations is stated as Algo- 
rithm [21 Its key properties are summarized in the following propositions. 

Proposition 2. The output P = TruncBinSplit(p) of Algorithm [5] is such that 
||P- P(0,iV)|| < 2~P. 

Proof. Set P(«) = P(0, l^N\) and Q iq) = P (H N 1> l^ N \)- Thcn , for < q < 
A, it holds that 



(12) ||PW -P M \\ < 



1 
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Algorithm 2. TruncBinSplit(p) 

The notation X^ q \ q = 0, 1, . . . refers to a single memory location X at different 
points q of the computation. 

1 Set e = 2-P 

2 Compute N such that \Sm - v(Q\ < e gjgg] 

3 Set A = |"~(/& + rlgiV)] , where /i and r are given following Eq. (JTJ) 

4 Compute M such that max^p 1 ||P(Li-^J , L^zr^J )ll + £ < M < C^ A , where C 
does not depend on p, /i, r [say, by approximating the right-hand side of (JH|) from 
above with O(lgp) bits of precision] 

s Initialize P<o) := jd g k( s+1 ) x ( s+1 ) 
e Forg = 0,l,...,A-l 

7 Q = (4j) := BinSpUt(LiiVj , [^^J) (Algorithm [J) 

s := ltunc((3^ lia+1 • Q, ^M" A + 1 e) 

9 P(« +1 ) := Trunc(Q(«) • ±M- A+q+1 E) 
io Return p( A > 



Indeed, this is true for q = 0. After Step [5] of each loop iteration, we have the 
bound UQfa) - 2sA/- A+1 e e since ||P(n)|| ^ 1 for all rc. Using {JT2J) and 

the inequality HQ^'H ^ M from Step [31 it follows that 

||Q(«)p(9) - Q(9)p(«)||<||Q(«) - g (9) ||||P (,z) || + ||Q (9) ||||P (9) - P (g) || 
< 2g+l £ 
^ 2A M A ~i~ l ' 
After taking into account the truncation error from Step El we obtain 

q+1 e 



p(<?+i) _ p(«+i)|| = ||p(«+i) _ Q(q) p(«)|| ^ 



A Af A -9-i 

which concludes the induction. □ 
Proposition 3. Not counting the cost of Step[H Algorithm [5] runs in time 

f 0(M(p)(/i + rlgp)lgp), ifp<oo, 
[ 1 \ 0(M(p)(ft + rlgp)), ifp = oo, 

as p,h,r — > oo with r = O(lgp) and /i = 0(p). In both cases, it uses O(p) bits 
of memory (where the hidden constant is independent of h and r, under the same 
growth assumptions). 

We neglect the cost of finding N to avoid a lengthy discussion of the complexity 
of the corresponding bound computation algorithms. It could actually be checked 
to be polynomial in r and lgp. 

Proof. Computing the bound M using Equation ^ as suggested is more than 
enough to ensure that lgM — 0(N/A). It requires O(N) arithmetic operations on 
0(lgp)-bit numbers, that is, o(N(\gN) 2 ) bit operations. 
By Proposition [1] each of the A calls to BinSplit requires 

O (M( f (h + r lg N) ) lg N) = O (M(p) lg p) 

bit operations. The resulting matrices all have size O(p), hence the divisions 
from Step[5]can be done in 0(M(p)) operations using Newton's method [251 Chap. 9]. 
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Time Space (classical) Space (trunc.) 



p < oo BinSplit 


0(M(p(/i + rlgp) Igp)) 
0(M(p(lgp) 2 )) 


0(p(fc + rlgp)) 


0(p) 


BitBurst 


0(plgp) 


0(p) 


p = oo BinSplit 


0(M(p(fc + rlgp))) 
0(M(p(lgp) 2 )) 


0(p(r + h/\gp)) 


0(p) 


BitBurst 


0(p) 


0(p) 



Table 1. Complexity of some D-finite function evaluation algo- 
rithms based on binary splitting. The rows labeled "BinSplit" 
summarize the cost of computing a single sum by binary splitting, 
with or without truncations. Those labeled "BitBurst" refer to the 
computation of y(C) by the "bit burst" method, using either of Al- 
gorithm[T]and Algorithm [5] at each step. All entries are asymptotic 
bounds as p, h — >• oo with h = 0(p). In the "BinSplit" case, we also 
let r tend to infinity under the assumption that r = O(lgp). The 
whole point of the "bit burst" method is to get rid the dependency 
on h. 



The truncations in Steps [H] and [5] ensure that the bit sizes of P and Q are always 
at most 

(14) lge- 1 + AlgM + lgA + 0(l) = 0(p). 

It follows that the matrix multiplications from Step [H] take 0(M(p)) operations 
each. Summing up, each iteration of the loop from Step El can be performed in 
0(M(p)lgp) operations, for a total of 0(AM(p)lgp). Equation (flU|) follows upon 
setting N = 0(p) or N = 0(p/lgp) according to ([6|). 

The required memory comprises space for the current values of P^ and Q^ q \ 
any temporary storage used by the operations from Steps [7] to O and an additional 
O(lgp) bits to manipulate auxiliary variables such as M and q. We have seen 
that P(«> and have bit size 0(p). Besides, our assumption that fast integer 
multiplication could be performed in linear space implies the same property for 
division by Newton's method. Thus, Steps [5] and [5] use 0(p) bits of auxiliary storage. 
Finally, again by Proposition [TJ the calls to Algorithm Q] use 0((N/ A)(h + r\gp)) = 
0(p) bits of memory. □ 

Plugging Algorithm [5] into the numerical evaluation algorithms mentioned at 
the end of Sect. yields corresponding improvements for the evaluation of D-finite 
functions at more general points. Table Q] summarizes the complexity bounds we 
obtain. The omitted proofs are direct adaptations of those that apply without 
truncations [51[221[T7]- There would be much to say on the hidden constant factors. 
The main result may be stated more precisely as follows. 

Theorem 1. Let U C (D be a simply connected domain such that £ U and 
a r (z) 7^ for all z £ U . Fix to,... ,£ r -i £ (D and ( £ U. Assume that is an 
ordinary point of H]), and let y be the unique solution of (P) defined on U and such 
that y ik) (0) = 4, < k < r. Then, the value y(Q may be computed with error 
bounded by 2~ p in time 0(M(p)(lgp) 2 ) and space 0(p), not counting the resources 
needed to approximate the £k or £ to precision 0(p) or to find suitable truncation 
orders for the Taylor series involved. 
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Finally, some comments are in order regarding the "working precision", that is, 
the size p' of the entries of P and Q in Algorithm O Equation (fl4|) suggests a 
number of "guard digits" p' — p = O(p). Moreover, if the bound M is computed 
using the hidden constant depends on the choice of ||-||. 

Let .Boo = hm„_ ! . 00 B{n). For the norm ||-|| opt given by Lemma [T] below, we have 

6-1 

lg 11^(0, fc)llopt < E feCPoolU + Oin- 1 )) = 0(lg(6 - a)), 

and hence lg||P(a,6)|| = 0(lg(6 — a)) for any norm ||-||. 

Lemma 1. There exists a matrix norm ||-|| opt such that || Boo 1 1 opt = L 

Proof. We mimic the classical proof of Householder's theorem [201 Sect. 4.2]. By ([5]), 
the limit Coo = lim n _>oo C(n) is the companion matrix of the polynomial z s a r (l/z). 
The eigenvalues of CCoo are strictly smaller than 1 in absolute value since \Q < p. 
Let T be such that r^C^T is in (lower) Jordan normal form. Let A > 0, and 
set n = diag(r, 1) ■ diag(l, A, . . . , X s ). Then n^BooII is lower triangular, with 
off-diagonal entries tending to zero as A — !> 0. Hence we have ||n _1 _BooH|| i = 1 
for A small enough. We choose such a A (e.g., A = 2 max(i |C|) ^ an< ^ set H^Hop* = 
||H "AU\\v ' ^ □ 

One way to eliminate the overestimation in the algorithm is to compute approx- 
imations of the matrices P(l^N\, L^J^-WJ) with O(lgp) digits of precision before 
doing the computation at full precision. One then uses the norms of these approx- 
imate products instead of those of the individual B(n) to determine M. We can 
also explicitly construct an approximation n of the matrix n from the proof of 
Lemma \T\ precise enough that || n -"-SooITl 1 1 = 1, and use the corresponding norm 
instead of in (Compare Algorithm B].) Other options include com- 
puting symbolic bounds on the coefficients of P(a, b) as a function of a and b [TB] 
or finding an explicit integer uq such that n uq =>• ||-B(n)|| opt = 1 based on the 
symbolic expression of n. Which variant to use in practice depends on the features 
of the implementation platform. 

In any case, replacing the O(-) in the space complexity bound by an explicit con- 
stant would also require more specific assumptions on the memory representation 
of the objects we work with, as well as finer control on the space complexity of 
integer multiplication and division (see, e.g., Roche [H)]). 

5. Final Remarks 

What we lose and what we retain. The price we pay for the reduced memory 
usage is the ability to easily extend the computation to higher precision. Indeed, the 
classical algorithm computes the exact value of the matrix P(0, N), from which we 
can deduce P(0, N') for any N' > N in time roughly proportional to N' — N. This 
is no longer true with the linear-space variant. In some "lucky" cases where P(0, N) 
can be represented exactly in linear space, it is possible to get the memory usage 
down to O(N) while preserving restartability: see Cheng et al. [1] and the references 
therein. Additionally, the resulting running time is reportedly lower than using 
truncations, probably owing to the fact that the size of the subproducts in the 
lg(iV/A) lower levels of the tree is reduced as well. Unfortunately, the applicability 
of the technique is limited to very special cases. 
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Two other traditional selling points of the binary splitting method are its easy 
parallelization and good memory locality. Nothing is lost in this respect, except 
that the memory bound grows to 0(t ■ p) when using t = o(lg A) parallel tasks in 
the approximate part of the computation. 

Generalizations. The idea of binary splitting "with truncations" and the outline 
of its analysis adapt to various settings not covered here. For instance, we may con- 
sider systems of linear differential equations instead of scalar equations [5] . Product 
trees of matrices over number fields K' = (Q(a) other than or over rings of 
truncated power series K' [[e\] / (e fc ) are also useful, respectively, to evaluate lim- 
its of D-finitc functions at regular singular points of their defining equations, and 
to make the analytic continuation process more efficient for equations of large or- 
der HZ] • It is not essential either that the coefficients of the recurrence relation 
satisfied by the y n are rational functions of n: all we really ask is that they have 
suitable growth properties and can be computed fast. 

Implementation. We are working on an implementation of the algorithm from 
Sect. U in an experimental branch of the software package NumGfun [16 . The 
current state of the code is available from 

http : / /marc .mezzarobba.net/ supplementary -material/trunc-CASC2012/| 
A comparison (updated periodically) with the implementation of binary splitting 
without truncations used in previous releases of NumGfun is also included. 

Acknowledgments. I would like to thank Nicolas Brisebarre and Bruno Salvy for 
encouraging me to write this note and offering useful comments, and Anne Vaugon 
for proofreading parts of it. 
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